Variant Discovery ◾ 133
4.2.2.2.4 Converting SAM files into BAM files
SAM files are plain text, so we convert them into the compressed binary BAM format to
save storage space. Moreover, BAM files can be processed faster by downstream tools. The
following script creates the directory “bam” and converts each SAM file into a BAM file:
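mkdir -p bam
cd sam
# Loop over the SAM files directly instead of parsing the output of ls
for f in *.sam;
do
    i=${f%.sam}    # strip the .sam extension
    # -b writes compressed BAM (-u would write uncompressed BAM)
    samtools view -b -o "../bam/${i}.bam" "${f}"
done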
cd ..
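Before moving on, it can be worth confirming that every SAM file produced a BAM file. The following is a minimal sketch and not part of the workflow above; the function name missing_bams is our own, and it only tests that a non-empty BAM file with the same base name exists, not that its content is valid:

```shell
# Hypothetical helper (not a samtools command): print the base name of every
# SAM file in sam_dir that has no non-empty BAM counterpart in bam_dir.
missing_bams() {
    local sam_dir="$1"
    local bam_dir="$2"
    local f base
    for f in "$sam_dir"/*.sam; do
        [ -e "$f" ] || continue              # no SAM files matched the pattern
        base="$(basename "$f" .sam)"
        # -s is true only if the file exists and is larger than zero bytes
        [ -s "$bam_dir/$base.bam" ] || echo "$base"
    done
}

# Example call, assuming the "sam" and "bam" directories used above:
# missing_bams sam bam
```

The call prints nothing when every SAM file has a matching BAM file. If a stronger check is needed, `samtools quickcheck bam/*.bam` verifies that each file has a valid header and an intact end-of-file block.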
4.2.2.2.5 Sorting and indexing alignments in the BAM files
The alignments in the BAM files must be sorted by coordinate, that is, by reference
sequence (chromosome) and then by position, before they can be indexed and used in the
downstream analysis. The following bash script uses samtools to sort and index the BAM
files and stores the results in a new directory called “sortedbam”:
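mkdir -p sortedbam
cd bam
for i in *.bam;
do
    # Sort by coordinate; -T sets the prefix for temporary files
    samtools sort -T ../sortedbam/tmp.sort -o "../sortedbam/${i}" "${i}"
    # Create the index (.bai) next to the sorted BAM file
    samtools index "../sortedbam/${i}"
done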
cd ..
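A sorted BAM file records its sort order in the SO tag of the @HD header line, which gives a quick way to confirm that the step above worked. The sketch below is an illustration under that assumption; the function name sort_order_from_header and the file name sample.bam are hypothetical:

```shell
# Hypothetical helper: read SAM header text on standard input and print the
# value of the SO (sort order) tag found on the @HD line, if there is one.
sort_order_from_header() {
    awk -F'\t' '$1 == "@HD" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SO:/) { print substr($i, 4); exit }
    }'
}

# Example call (requires samtools; sample.bam is a placeholder file name):
# samtools view -H sortedbam/sample.bam | sort_order_from_header
```

For a file written by `samtools sort` with its default settings, the expected value is `coordinate`. An empty result means the header has no @HD line or no SO tag.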
4.2.2.2.6 Extracting a chromosome or an interval
Most of the time, we are interested in identifying variants across the whole genome.
However, a study may sometimes focus on a specific chromosome or an interval of the
genome. If your target is the whole genome, you can skip this step. Keep in mind that
identifying variants from the whole genome requires a large amount of memory and storage
space. Therefore, for demonstration and for the sake of simplicity, we will focus only on
chromosome 21 of the human genome. In the following, we create a directory “chr21” and
use samtools to extract the alignments of chromosome 21 from the sorted, indexed BAM
files, store them as new BAM files, and index them. The region name (“chr21” here) must
match the sequence name used in the reference genome:
mkdir -p chr21
cd sortedbam
for f in *.bam;
do
    # Region queries need the index created in the previous step
    samtools view -b "${f}" chr21 > "../chr21/${f}"
    samtools index "../chr21/${f}"
done
cd ..
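Extraction returns no alignments if the region name does not match a sequence name in the BAM header, for example when the reference uses “21” rather than “chr21”. The following sketch checks for the name in the @SQ header lines before extracting; the function name has_contig and the file name sample.bam are hypothetical:

```shell
# Hypothetical helper: read SAM header text on standard input and succeed
# (exit status 0) only if an @SQ line declares the sequence name given in $1.
has_contig() {
    awk -F'\t' -v name="$1" '
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++)
                if ($i == "SN:" name) found = 1
        }
        END { exit !found }
    '
}

# Example call (requires samtools; sample.bam is a placeholder file name):
# samtools view -H sortedbam/sample.bam | has_contig chr21 || echo "chr21 not found"
```

To extract an interval rather than a whole chromosome, samtools accepts a region of the form `chr21:START-END`, where START and END are 1-based inclusive coordinates that you supply.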